1 Introduction

ClinicalTrials.gov was first released in 2000. As of March 2019, ClinicalTrials.gov includes 300,676 research studies in all 50 U.S. states and in 208 countries. The CTTI AACT project and database provide a harmonizing schema and convenient access. However, major challenges to knowledge discovery from these data remain, such as the lack of standard terminology. To address this, for the use case of elucidating drug target hypotheses, we have used state-of-the-art, domain-specialized text mining with synonym resolution for two specific classes of entities: (1) chemicals and (2) diseases. Chemicals are identified and resolved using NextMove LeadMine. Diseases, indications and other phenotypic terms are mined via JensenLab Tagger with the Disease Ontology dictionary, and via NLM-supplied MeSH terms. Protein targets are associated via ChEMBL bioactivities, by cross-referencing molecular structures. Another fundamental challenge is assessing the confidence of inferences from noisy and disparate data. We propose a scoring system for assessing confidence in target hypotheses inferred from aggregated clinical trials, with emphasis on higher-confidence, novel predictions with the potential to illuminate the understudied druggable genome. (Paragraph excerpted from manuscript draft.)

1.1 Issues

  • Our prior belief is that target NER is unlikely to be useful, since the descriptive text of clinical trials is generally not written to communicate molecular mechanisms to research scientists, but focuses on clinical efficacy and safety. As due diligence we perform target NER anyway and, to quantify concordance with or refutation of our prior belief, we also perform target NER on arbitrary non-biomedical text: tweets from the Twitter API for #brexit (26 Nov 2019). We find 8.64 target entities per 1000 characters in the tweets, vs. 6.63 in the clinical trial descriptions. While not proof, a hit rate in non-biomedical text at least as high as in trial descriptions suggests that many target matches in the descriptions are spurious. This supports our belief and our less direct method via chemical NER.
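
The density figure used in this comparison (entities per 1000 characters) can be sketched in base R as below. This is only an illustration of the arithmetic: the function name `mention_density` and the toy inputs are assumptions, not the code or data used for this report.

```r
# Mentions per 1000 characters of text.
# `texts` is a character vector of documents (tweets or trial descriptions);
# `n_mentions` is the total number of NER hits found in those documents.
mention_density <- function(texts, n_mentions) {
  total_chars <- sum(nchar(texts))
  1000 * n_mentions / total_chars
}

# Toy example with invented text lengths (2000 characters in total).
toy_texts <- c(strrep("a", 1500), strrep("b", 500))
toy_density <- mention_density(toy_texts, n_mentions = 13)
print(toy_density)
```

Normalizing by character count rather than by document count keeps short tweets and long trial descriptions comparable.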

1.2 Identifier mappings:

NCT_ID →(JensenLab:Tagger)→ DOID
NCT_ID →(AACT)→ MeSH
NCT_ID →(NextMove:LeadMine)→ SMILES
SMILES →(PubChem)→ CID
CID →(PubChem)→ INCHIKEY
INCHIKEY →(ChEMBL)→ MOLECULE_CHEMBL_ID
MOLECULE_CHEMBL_ID →(ChEMBL)→ ACTIVITY_ID
ACTIVITY_ID →(ChEMBL)→ TARGET_CHEMBL_ID
TARGET_CHEMBL_ID →(ChEMBL)→ COMPONENT_ID
COMPONENT_ID →(ChEMBL)→ UNIPROT
ACTIVITY_ID →(ChEMBL)→ DOCUMENT_CHEMBL_ID
DOCUMENT_CHEMBL_ID →(ChEMBL)→ PUBMED_ID
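
As a rough sketch, each arrow above can be read as a join between two mapping tables. The base R example below chains the first few steps from study to ChEMBL molecule; all identifiers and column names are invented placeholders, and the choice of inner joins is an assumption about how unmapped records are handled.

```r
# Toy mapping tables; every identifier here is a made-up placeholder.
nct2smi <- data.frame(nct_id = c("NCT_A", "NCT_A", "NCT_B"),
                      smiles = c("SMI_1", "SMI_2", "SMI_1"),
                      stringsAsFactors = FALSE)
smi2cid <- data.frame(smiles = c("SMI_1", "SMI_2"),
                      cid = c(101L, 102L),
                      stringsAsFactors = FALSE)
cid2ink <- data.frame(cid = c(101L, 102L),
                      inchikey = c("INK_1", "INK_2"),
                      stringsAsFactors = FALSE)
ink2chembl <- data.frame(inchikey = "INK_1",
                         molecule_chembl_id = "CHEMBL_X",
                         stringsAsFactors = FALSE)

# merge() performs an inner join by default, so a record is dropped
# as soon as any step of the chain has no mapping for it.
links <- merge(nct2smi, smi2cid, by = "smiles")
links <- merge(links, cid2ink, by = "cid")
links <- merge(links, ink2chembl, by = "inchikey")
print(links[, c("nct_id", "smiles", "cid", "inchikey", "molecule_chembl_id")])
```

In this toy case only SMI_1 reaches ChEMBL, so the study-to-molecule table shrinks at the last step, which mirrors the decreasing counts reported in later sections.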

1.3 Input files:

  • (CTTI AACT) aact_studies.tsv
  • (CTTI AACT) aact_drugs.tsv
  • (CTTI AACT) aact_descriptions.tsv
  • (NextMove LeadMine) aact_drugs_leadmine.tsv
  • (PubChem) aact_drugs_smi_pubchem_cid.tsv
  • (PubChem) aact_drugs_smi_pubchem_cid2ink.tsv
  • (ChEMBL) aact_drugs_ink2chembl.tsv
  • (ChEMBL) aact_drugs_chembl_activity.tsv
  • (ChEMBL) aact_drugs_chembl_target_component.tsv
  • (ChEMBL) aact_drugs_chembl_document.tsv
  • (IDG TCRD/Pharos) pharos_targets.tsv
  • (JensenLab Tagger) aact_descriptions_tagger_disease_matches.tsv
  • (JensenLab Dictionary) diseases_entities.tsv

nct_id is the study ID.

## [1] "Mon Dec  2 09:57:13 2019"
library(readr)
library(data.table)
library(tm)
library(stringr)
library(plotly, quietly=T)

2 Input studies and drugs

2.1 Studies

Read file of all studies in AACT.

## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"

2.1.1 Study references

References of type results_reference may offer stronger evidence, and therefore higher confidence.

## [1] "references: 388031; NCT_IDs: 61208; PMIDs: 287758; results_references: 64880"

2.2 Drugs

Read file of all drugs in AACT.

  • id is the AACT INTERVENTION_ID, corresponding to an instance of a drug, dose, delivery, etc. in a study.
  • Note that one study may involve multiple drugs.
  • At this point a “drug” is imprecisely identified by name, generally one of many synonyms.
## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"

2.3 Studies: Interventional drug studies only

Select only Interventional studies (study_type) associated with drugs (via NCT_ID).

## [1] "Interventional studies: 237892 (79.2%)"
## [1] "Interventional drug studies: 124421 ; unique NCT_IDs: 124421"

Drug studies and drugs, by phase
phase              N_studies  N_drugs
Early Phase 1           1574     2615
Phase 1                23603    48593
Phase 1/Phase 2         6663    13288
Phase 2                33910    68850
Phase 2/Phase 3         3305     6503
Phase 3                22988    49507
Phase 4                19593    36331
NA                     12785    29390

Drug studies and drugs, by overall_status
overall_status           N_studies  N_drugs
Active, not recruiting        6420    13962
Completed                    72053   145006
Enrolling by invitation        638     1060
Not yet recruiting            4138     8001
Recruiting                   16723    33973
Suspended                      463      945
Terminated                   10138    19618
Unknown status               10106    18463
Withdrawn                     3742     6969
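
The selection described in this section, followed by per-phase counts, could be sketched in base R as follows. The data are invented; the column names study_type, phase and nct_id follow the text, while counting N_drugs as the number of drug records per phase is an assumption about how that column was derived.

```r
# Toy versions of the studies and drugs tables (invented rows).
studies <- data.frame(
  nct_id = c("NCT_1", "NCT_2", "NCT_3", "NCT_4"),
  study_type = c("Interventional", "Observational",
                 "Interventional", "Interventional"),
  phase = c("Phase 1", NA, "Phase 2", "Phase 2"),
  stringsAsFactors = FALSE)
drugs <- data.frame(
  id = 1:4,
  nct_id = c("NCT_1", "NCT_1", "NCT_2", "NCT_4"),
  name = c("drug a", "drug b", "drug c", "drug a"),
  stringsAsFactors = FALSE)

# Keep interventional studies that have at least one drug record.
is_interventional <- studies$study_type == "Interventional"
has_drug <- studies$nct_id %in% drugs$nct_id
drug_studies <- studies[is_interventional & has_drug, ]

# Attach drug records, then count studies and drug records per phase.
study_drugs <- merge(drug_studies, drugs, by = "nct_id")
by_phase <- data.frame(phase = sort(unique(study_drugs$phase)),
                       stringsAsFactors = FALSE)
by_phase$N_studies <- vapply(by_phase$phase, function(p) {
  length(unique(study_drugs$nct_id[which(study_drugs$phase == p)]))
}, integer(1), USE.NAMES = FALSE)
by_phase$N_drugs <- vapply(by_phase$phase, function(p) {
  length(which(study_drugs$phase == p))
}, integer(1), USE.NAMES = FALSE)
print(by_phase)
```

Note that sort() drops NA, so studies without a phase would need a separate row if they are to be reported as in the table above.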

2.4 Drug studies by Phase and Status

2.5 Drug studies and drugs by start_year

3 NextMove LeadMine Chemical NER

AACT drug names are resolved by LeadMine to standard names and to chemical structures, represented as SMILES. Note that one name may include multiple chemicals. With resolved structures, we can count drugs in a cheminformatically rigorous way, as active pharmaceutical ingredients (APIs).

## [1] "Drug unique SMILES resolved by LeadMine: 4699 ; unique intervention IDs: 171741"

3.1 Chemical NER mentions

3.1.1 Totals by merging of synonyms to resolved structure (locally canonical SMILES)

Top 20 drugs by total mentions
N_mentions names
2637 Abraxane; PACLITAXEL; Paclitaxel; Taxol; abraxane; paclitaxel; taxol
2545 CYCLOPHOSPHAMIDE; Ciclophosphamide; Cyclophosphamid; Cyclophosphamide; ciclophosphamide; cyclophosphamide
2461 CISPLATIN; Cis-platinum; Cisplatin; Cisplatine; Cisplatinum; cis Platinum; cis-platinum; cisplatin; cisplatine; cisplatinum
2070 DEXAMETHASONE; Dexamethason; Dexamethasone; Dexamethosone; Maxitrol; OZURDEX; Oradexon; Ozurdex; dexamethason; dexamethasone; dexamethosone
2054 CARBOPLATIN; Carboplatin; Carboplatine; Paraplatin; carboplatin; carboplatine
1779 DOCETAXEL; Docetaxel; docetaxel
1625 METFORMIN; MetFORMIN; Metformin; Metformine; metformin; metformine
1540 GEMCITABINE; Gemcitabine; gemcitabine
1342 CAPECITABINE; Capecitabin; Capecitabine; XELODA; Xeloda; capecitabine; xeloda
1178 Cortancyl; Lodotra; Meticorten; Prednison; Prednisone; RAYOS; prednison; prednisone
1157 0xaliplatin; Eloxatin; OXALIPLATIN; OXAliplatin; Oxaliplatin; Oxaliplatine; eloxatin; oxaliplatin; oxaliplatine
1157 METHOTREXATE; Methotrexate; Metoject; methotrexate
1086 BUPIVACAINE; Bupivacain; Bupivacaine; EXPAREL; Exparel; SKY0402; bupivacain; bupivacaine
1044 ETOPOSIDE; Etoposid; Etoposide; etoposide
1027 ADOPORT; ADVAGRAF; Adoport; Advagraf; ENVARSUS; Envarsus; FK-506; FK506; PROGRAF; Prograf; Protopic; TACROLIMUS; Tacrolimus; tacrolimus
978 NORMAL SALINE; Normal Saline; Normal saline; normal salin; normal saline
977 LIDOCAINE; LMX 4; LMX4; Lidocain; Lidocaine; Lidoderm; Lignocain; Lignocaine; Oraqix; lidocain; lidocaine; lignocaine
908 CYTARABINE; Cytarabine; Cytosar; DepoCyt; DepoCyte; Depocyt; Depocyte; cytarabine; cytosar
903 COPEGUS; Copegus; REBETOL; RIBAVIRIN; Rebetol; Ribasphere; Ribavarin; Ribavirin; Ribavirine; Virazole; rebetol; ribavarin; ribavirin
846 Diprivan; PROPOFOL; Propofol; propofol
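
A table like the one above can be produced by grouping mentions on the resolved structure and concatenating the distinct names seen for it. The base R sketch below uses placeholder SMILES keys and a handful of invented mention rows; it is one possible way to do the merge, not necessarily the one used here.

```r
# Toy mention table: one row per NER hit, keyed by resolved SMILES.
# "SMI_1" and "SMI_2" are placeholders, not real SMILES strings.
mentions <- data.frame(
  smiles = c("SMI_1", "SMI_1", "SMI_1", "SMI_2"),
  name = c("Taxol", "paclitaxel", "Taxol", "metformin"),
  stringsAsFactors = FALSE)

# Total mentions per structure.
n_mentions <- table(mentions$smiles)

# Distinct names per structure; method = "radix" sorts in C-locale order
# (uppercase before lowercase), independent of the session locale.
names_by_smi <- tapply(mentions$name, mentions$smiles, function(x) {
  paste(sort(unique(x), method = "radix"), collapse = "; ")
})

top <- data.frame(
  smiles = names(n_mentions),
  N_mentions = as.integer(n_mentions),
  names = as.character(names_by_smi[names(n_mentions)]),
  stringsAsFactors = FALSE)
top <- top[order(-top$N_mentions), ]
print(top)
```

Grouping on structure rather than on name is what merges brand names, spelling variants and case variants into a single row.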

3.1.2 Chemical NER mentions resolved to structures (SMILES)

## [1] "Drugs (drug names) with resolved structure: 180555 / 197300 (91.5%)"

3.1.3 Chemical NER mentions by intervention ID.

## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"

3.1.4 Chemical NER mentions by trial (NCT ID).

## [1] "Mentions by study: 92966 / 99647 (93.3%)"

3.1.5 Chemical NER mentions by drug, i.e. name in AACT.

## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
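
The coverage lines in this section all have the same "hits / total (percent)" shape. A small helper such as the following would reproduce them; the function name `coverage` is hypothetical, and the counts are taken from the by-study line above.

```r
# Format a coverage line as "label: hits / total (xx.x%)".
coverage <- function(label, n_hit, n_total) {
  sprintf("%s: %d / %d (%.1f%%)", label, n_hit, n_total,
          100 * n_hit / n_total)
}

# Counts copied from the "Mentions by study" line in this section.
line_study <- coverage("Mentions by study", 92966, 99647)
print(line_study)
```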

4 PubChem:

4.1 Intervention IDs to CIDs from PubChem (via SMILES)

## [1] "PubChem SMILES2CID hits: 3933 / 4540 (86.6%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153342"

4.2 InChIKeys from PubChem (via CIDs)

## [1] "PubChem CIDs with InChIKeys: 3783"

5 IDG/TCRD:

TCRD is used for Target Development Level (TDL) and other target metadata.

6 ChEMBL:

6.1 ChEMBL molecule IDs, and properties (via InChIKeys)

An alternative worth considering is to map via PubChem CIDs and UniChem instead.

## [1] "ChEMBL compounds mapped via InChIKeys: 3316"

6.2 ChEMBL activities for mapped compounds

Select only activities that have pChEMBL values, both for relevance to protein targets and for confidence.

## [1] "ChEMBL activities: 127943"
## [1] "ChEMBL activities molecules: 2302 ; canonical_smiles: 2302 ; targets: 3877 ; documents: 16959"
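
A minimal base R sketch of this selection is shown below, on invented rows. The column name pchembl_value follows the ChEMBL activities table; the other identifiers are placeholders.

```r
# Toy activities table (invented rows); NA marks a missing pChEMBL value.
activities <- data.frame(
  activity_id = 1:5,
  molecule_chembl_id = c("CHEMBL_A", "CHEMBL_A", "CHEMBL_B",
                         "CHEMBL_C", "CHEMBL_C"),
  target_chembl_id = c("T_1", "T_2", "T_1", "T_3", "T_3"),
  pchembl_value = c(7.2, NA, 5.1, NA, 8.0),
  stringsAsFactors = FALSE)

# Keep only activities that carry a pChEMBL value.
with_pchembl <- activities[!is.na(activities$pchembl_value), ]

# Summary counts analogous to the lines printed in this section.
summary_counts <- c(
  activities = nrow(with_pchembl),
  molecules = length(unique(with_pchembl$molecule_chembl_id)),
  targets = length(unique(with_pchembl$target_chembl_id)))
print(summary_counts)
```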

6.2.1 Activity and molecule counts by assay types

Activity and molecule counts by assay types
assay_type N_molecule N_activity
F:Functional 1828 73811
B:Binding 1831 49891
A:ADMET 759 4058
P:Physicochemical 44 120
T:Toxicity 28 59
U:Unclassified 3 4

6.3 ChEMBL targets (via activities)

## [1] "ChEMBL target proteins: 3157"
## [1] "ChEMBL target proteins mapped to TCRD (human): 1805"

6.4 ChEMBL targets by organism:

## [1] "Organisms: 187"
Targets by organism (top 10)
organism                    N_targets  Types
Homo sapiens                     1806  CHIMERIC PROTEIN; PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; PROTEIN-PROTEIN INTERACTION; SELECTIVITY GROUP; SINGLE PROTEIN
Rattus norvegicus                 529  PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SELECTIVITY GROUP; SINGLE PROTEIN
Mus musculus                      238  CHIMERIC PROTEIN; PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SINGLE PROTEIN
Bos taurus                         98  PROTEIN COMPLEX; PROTEIN COMPLEX GROUP; PROTEIN FAMILY; SINGLE PROTEIN
Sus scrofa                         36  PROTEIN COMPLEX; PROTEIN FAMILY; SINGLE PROTEIN
Cavia porcellus                    26  SINGLE PROTEIN
Escherichia coli K-12              19  PROTEIN COMPLEX; PROTEIN FAMILY; SINGLE PROTEIN
Oryctolagus cuniculus              18  SINGLE PROTEIN
Escherichia coli                   17  PROTEIN COMPLEX; SINGLE PROTEIN
Mycobacterium tuberculosis         17  SINGLE PROTEIN

6.5 Human single-protein targets only, by IDG family.

## [1] "Human targets: 1806"
idgFamily N
Kinase 405
Enzyme 330
GPCR 158
None 120
IC 64
Transporter 53
Epigenetic 35
NR 28
TF 20
TF; Epigenetic 3
## [1] "Human single-protein targets: 1216 ; unique UniProts: 1216"
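
The restriction to human single-protein targets amounts to two filters, sketched here in base R on invented rows. The column names organism and target_type follow the ChEMBL target table; the uniprot column and all identifiers are placeholders.

```r
# Toy targets table (invented rows).
targets <- data.frame(
  target_chembl_id = c("T_1", "T_2", "T_3", "T_4"),
  organism = c("Homo sapiens", "Homo sapiens",
               "Rattus norvegicus", "Homo sapiens"),
  target_type = c("SINGLE PROTEIN", "PROTEIN COMPLEX",
                  "SINGLE PROTEIN", "SINGLE PROTEIN"),
  uniprot = c("P_A", "P_B", "P_C", "P_D"),
  stringsAsFactors = FALSE)

# First filter by organism, then by target type.
human <- targets[targets$organism == "Homo sapiens", ]
human_single <- human[human$target_type == "SINGLE PROTEIN", ]

msg <- sprintf("Human single-protein targets: %d ; unique UniProts: %d",
               nrow(human_single), length(unique(human_single$uniprot)))
print(msg)
```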

6.6 ChEMBL targets by IDG TDL:

## [1] "   Tchem:    767" "   Tclin:    342" "    Tbio:    105"
## [4] "   Tdark:      2"

7 JensenLab Tagger Diseases NER

Diseases are tagged using the JensenLab DOID entities dictionary, on descriptions from the detailed_descriptions table.

Likely false positives, manually removed:

## [1] "Total disease mentions: 497207 (in 124421 studies)"

7.1 Disease mention totals by merging to resolved Disease Ontology term (DOID).

Top 20 diseases by total mentions
doid N_mentions terms
DOID:162 28596 CANCER; CANcer; Cancer; Malignant Tumor; Malignant neoplasm; Malignant tumor; Primary Cancer; Primary cancer; cancer; malignant Tumor; malignant neoplasm; malignant tumor; primary cancer
DOID:9351 17274 DIABETES; DIABETES MELLITUS; DIAbetes; DIabetes; Diabetes; Diabetes Mellitus; Diabetes mellitus; diabetes; diabetes Mellitus; diabetes mellitus; diabetes-mellitus
DOID:6713 16632 CVA; Cerebrovascular Accident; Cerebrovascular Disease; Cerebrovascular accident; Cerebrovascular disease; STROKE; STRokE; Stroke; cerebro- vascular disease; cerebro-vascular disease; cerebrovascul…
DOID:2030 12084 ANXIETY; Anxiety; Anxiety Disorder; Anxiety state; anxiety; anxiety disorder; anxiety state; anxiety syndrome; anxiety-state
DOID:1612 10583 BREAST CANCER; BReast CAncer; BReast Cancer; Breast Cancer; Breast cancer; Breast tumor; Breast-cancer; Primary breast cancer; breast Cancer; breast caNcEr; breast cancer; breast tumor; breast-canc…
DOID:2841 10021 ASTHMA; Asthma; BHR; Bronchial hyper-reactivity; Bronchial hyperreactivity; EIA; Exercise-induced asthma; asthma; bronchial hyper reactivity; bronchial hyper-reactivity; bronchial hyperreactivity; …
DOID:3083 9782 CHRONIC OBSTRUCTIVE PULMONARY DISEASE; COLD; COPD; COPd; Chronic Obstructive Lung Disease; Chronic Obstructive Lung disease; Chronic Obstructive Pulmonary Disease; Chronic Obstructive Pulmonary dis…
DOID:9970 9303 OBESITY; OBesity; Obesity; obEsity; obe-sity; obesity
DOID:10763 9144 HBP; HTN; HYPERTENSION; High Blood Pressure; High blood pressure; High-blood pressure; Hypertension; Hypertensive disease; high blood Pressure; high blood pressure; high blood-pressure; htn; hyper-…
DOID:3393 6816 C-HD; CAD; CHD; CORONARY ARTERY DISEASE; CORONARY SYNDROME; CORONARY syndrome; ChD; Coronary ARtery DIsease; Coronary Artery Disease; Coronary Disease; Coronary Heart Disease; Coronary Heart diseas…
DOID:0060145 6115 ANALGESIA; Analgesia; analgeSia; analgesia
DOID:9352 5848 Diabetes Mellitus Type 2; Diabetes Mellitus Type II; Diabetes Mellitus type 2; Diabetes Mellitus, Type II; Diabetes mellitus Type 2; Diabetes mellitus non-insulin-dependent; Diabetes mellitus type …
DOID:10283 5056 Familial Prostate Cancer; HPC; PRostate Cancer; Prostate CAncer; Prostate Cancer; Prostate cancer; Prostatic cancer; hereditary prostate cancer; prostate Cancer; prostate cancer; prostate-cancer; p…
DOID:8469 4985 FLU; Flu; Influenza; flu; influenza
DOID:225 4962 SYNDROME; Syndrome; syn drome; syndrome
DOID:3908 4959 NSCLC; Non Small Cell Lung Cancer; Non Small Cell Lung Carcinoma; Non Small Cell Lung cancer; Non small cell lung cancer; Non small-cell lung cancer; Non- small cell lung cancer; Non-Small Cell Lun…
DOID:784 4841 CKD; CKF; CRD; CRF; Chronic Kidney Disease; Chronic Kidney disease; Chronic Kidney failure; Chronic Renal Disease; Chronic kidney disease; Chronic kidney failure; Chronic renal disease; chronic Kid…
DOID:5419 4689 SCHIZOPHRENIA; Schizophrenia; schizophrenia
DOID:684 3836 HCC; HEPATOCELLULAR CARCINOMA; Hepatocellular Carcinoma; Hepatocellular carcinoma; Hepatoma; hcc; hepato-cellular carcinoma; hepatocellular Carcinoma; hepatocellular carcinoma; hepatoma
DOID:5844 3664 Heart Attack; Heart attack; MYOCARDIAL INFARCTION; Myocardial Infarct; Myocardial Infarction; Myocardial infarct; Myocardial infarction; heart attack; myo-cardial infarction; myocardiaL infARction;…

7.2 Disease mentions by study.

Synonym terms are sorted by frequency.

Disease mentions by study (Random sample of studies)
nct_id doid N_mentions disease_terms
NCT00448669 DOID:526 2 HIV infection
NCT00635674 DOID:0050848 11 OSA;obstructive sleep apnea
NCT00635674 DOID:0014667 6 metabolic syndrome
NCT00635674 DOID:1936 2 atherosclerosis
NCT00635674 DOID:10763 1 hypertension
NCT00775606 DOID:11405 1 diphtheria
NCT00775606 DOID:11338 1 tetanus
NCT01118520 DOID:7693 4 AAA;abdominal aortic aneurysm
NCT01169051 DOID:1324 2 lung cancer
NCT01169051 DOID:162 2 cancer
NCT01169051 DOID:0060224 1 atrial fibrillation
NCT01169051 DOID:0060145 1 analgesia
NCT01169051 DOID:5041 1 esophageal cancer
NCT01169051 DOID:1612 1 breast cancer
NCT01285219 DOID:1227 3 neutropenia
NCT01673893 DOID:5844 2 Myocardial Infarction;myocardial infarct
NCT01693484 DOID:9351 1 diabetes
NCT01693484 DOID:848 1 arthritis
NCT01693484 DOID:178 1 vascular disease
NCT01927887 DOID:3310 1 allergic
NCT01927887 DOID:2355 1 anemia
NCT01927887 DOID:1781 1 thyroid cancer
NCT01927887 DOID:1205 1 allergy
NCT03300830 DOID:0111157 5 Castleman disease
NCT03300830 DOID:162 4 cancer
NCT03300830 DOID:0111152 1 Multicentric Castleman Disease
NCT03300830 DOID:8632 1 Kaposi sarcoma
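
The per-study table above, with synonym terms ordered by frequency, could be built along these lines in base R. The rows and DOID placeholders are invented, and `collapse_by_freq` is a hypothetical helper name.

```r
# Toy tagger matches: one row per disease mention in a study description.
matches <- data.frame(
  nct_id = rep("NCT_X", 5),
  doid = c("DOID_1", "DOID_1", "DOID_1", "DOID_2", "DOID_1"),
  term = c("OSA", "obstructive sleep apnea", "OSA", "hypertension", "OSA"),
  stringsAsFactors = FALSE)

# Join the distinct terms of one group, most frequent first.
collapse_by_freq <- function(terms) {
  counts <- sort(table(terms), decreasing = TRUE)
  paste(names(counts), collapse = ";")
}

# Mention counts and collapsed terms per (study, disease) pair.
n <- aggregate(term ~ nct_id + doid, data = matches, FUN = length)
names(n)[3] <- "N_mentions"
terms <- aggregate(term ~ nct_id + doid, data = matches,
                   FUN = collapse_by_freq)
names(terms)[3] <- "disease_terms"

by_study <- merge(n, terms, by = c("nct_id", "doid"))
by_study <- by_study[order(by_study$nct_id, -by_study$N_mentions), ]
print(by_study)
```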

8 JensenLab Tagger targets NER (human proteins)

Many false positives arise from synonymy collisions with common words (e.g. “Aim 1”, “Nut”).

## [1] "Total target mentions: 556908 (in 124421 studies)"

8.1 Target mention totals by merging to resolved Ensembl ID (ENSP).

Top 20 human proteins by total mentions
ensp N_mentions terms
ENSP00000380432 19677 I-NS; insulin; Insulin; ins; Ins; INSULIN; INS; 1 FU 2; 1HI T; inSUlin; INs; INsulin; 3 in C; InS; InsuLin
ENSP00000376823 14422 MRI; MRi; MRI 2; mri; MRI2; MR I; MRI_2; MRI 2; MRI-2
ENSP00000255030 4513 C-Reactive protein; CRP; C-reactive protein; C reactive protein; C Reactive Protein; c reactive protein; c-reactive protein; C-Reactive Protein; CRp; crp; C-reactive Protein; CrP; C - reactive prot…
ENSP00000225474 4343 filgrastim; granulocyte colony-stimulating factor; G-CSF; granulocyte-colony stimulating factor; Filgrastim; GCSF; granulocyte colony stimulating factor; Granulocyte-colony stimulating factor; Gran…
ENSP00000275493 4124 EGFR; epidermal growth factor receptor; eGFR; HER1; Epidermal growth factor receptor; ERBB; Men A; HER-1; MenA; erythroblastic leukemia viral; Epidermal Growth Factor receptor; ERBB1; e-GFR; Epider…
ENSP00000478570 4016 VEGF; vascular endothelial growth factor; Vascular Endothelial Growth Factor; Vascular endothelial growth factor; VEGF family; vascular-endothelial growth factor; 1 mkg; vegf; vascular endothelial …
ENSP00000398698 3661 Tumor Necrosis Factor; Tumor necrosis factor; TNF; TNF alpha; tumor necrosis factor; TNFalpha; TNF-alpha; DIF; TNFa; Dif; TNFA; TNF-a; Tumor necrosis Factor; TNF-Alpha; dif; TNF Alpha; TNF-A; tumor…
ENSP00000011653 3611 CD4; CD4-receptor; CD 4; 3 CD4; CD4 receptor; CD4 molecule; CD-4
ENSP00000314151 3480 PSA; prostate specific antigen; PsA; prostate-specific antigen; aPS; Prostate Specific Antigen; Prostate specific antigen; APs; ApS; Prostate specific Antigen; APS; Prostate-Specific Antigen; Prost…
ENSP00000452780 3459 1 of 2; 5 men; 1- Age; beta-2 microglobulin; 3 MRI; 3 to -2; 3 g iv; 4 pre; 5 mEq; 2 HLA; B2M; beta 2-microglobulin; beta2-microglobulin; 1 HLA; 3 Low; beta2 microglobulin; 3 low; 5 meq; 5mEq; 5 in…
ENSP00000327246 3240 1 of 2; 3 - HCV; VIPR; 3 HCV; 1-of-2; 1of 2
ENSP00000226730 3152 Interleukin-2; IL-2; aldesleukin; IL2; interleukin 2; interleukin-2; interleukin2; hIL2; Il-2; Aldesleukin; IL - 2; Interleukin 2; interleukin—2; ALDESLEUKIN; I L-2; T cell growth factor; lymphok…
ENSP00000313950 3060 Aim 1; AIM 1; aim 1; Aim1; Aim 1; Aim-1; AURORA 1; aim1; AIM1; Aurora B; aim 1; Aim- 1
ENSP00000296589 3051 Aim 1; AIM 1; aim 1; Aim1; Aim 1; Aim-1; aim1; AIM1; SLC45A2; aim 1; Aim- 1
ENSP00000358062 3047 Aim 1; AIM 1; aim 1; Aim1; Aim 1; Aim-1; aim1; AIM1; aim 1; Aim- 1; ST4
ENSP00000385675 2957 interleukin -6; IL-6; IL6; Interleukin 6; interleukin-6; Interleukin-6; interleukin 6; CDF; HGF; Interleukin- 6; IL- 6; BSF-2; IL 6; Il-6; il-6; Interleukin - 6; IL—6; InterLeukin-6; interleukin-…
ENSP00000269571 2877 HER2; human epidermal growth factor receptor 2; HER-2; ErbB2; Human Epidermal Growth Factor Receptor 2; Her2/Neu; human epidermal growth factor receptor-2; ERBB2; erb-b2 receptor tyrosine kinase 2;…
ENSP00000387662 2872 GLP-1; glucagon; glucagon-like peptide-1; GLP1; Glucagon-like peptide-1; Glucagon; glucagon-like peptide 1; glucagon like peptide-1; glucagon-like-peptide-1; glucagon-like peptide 2; GLP2; glucagon…
ENSP00000295897 2851 albumin; serum albumin; Albumin; Serum Albumin; Serum albumin; HSA; alb; ALB; Alb; hsa; ALbumin; 2b XL
ENSP00000357112 2757 Aim 2; AIM 2; aim 2; Aim-2; AIM2; Aim2

8.2 Target mentions by study.

Synonym terms are sorted by frequency.

Target mentions by study (Random sample of studies)
nct_id ensp N_mentions target_terms
NCT00238043 ENSP00000252723 4 Epoetin
NCT00702195 ENSP00000410257 1 IVF
NCT00731159 ENSP00000410257 1 IVF
NCT00960518 ENSP00000309968 2 TACE
NCT01168609 ENSP00000333203 4 PCI
NCT01168609 ENSP00000216714 1 apex
NCT01168609 ENSP00000317780 1 Cox
NCT01168609 ENSP00000321260 1 Cox
NCT01215994 ENSP00000343656 3 GFR
NCT01215994 ENSP00000381448 2 cystatin C
NCT02134977 ENSP00000328236 1 rod
NCT02134977 ENSP00000356520 1 RHa
NCT02134977 ENSP00000405330 1 estrogen receptor
NCT02383355 ENSP00000263686 2 CD62P;P-selectin
NCT02383355 ENSP00000265316 1 ABC
NCT02671604 ENSP00000265970 2 CPK
NCT02671604 ENSP00000215882 1 cTp
NCT02671604 ENSP00000263100 1 ABG
NCT02671604 ENSP00000320117 1 STP
NCT02671604 ENSP00000348019 1 AST
NCT02671604 ENSP00000367038 1 HES
NCT02671604 ENSP00000378972 1 STP
NCT03254264 ENSP00000370546 4 ASD

9 Enumerate study-drug-disease-target links.

References are included as well.

Since each study may be associated with multiple drugs, targets and diseases, we build a table of all associated combinations, then aggregate by study (NCT_ID). For DOIDs with multiple terms, we keep only the most common term, for simplicity.

## [1] "study-disease links: 237415"
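
The enumeration of combinations described in this section can be illustrated in base R with many-to-many joins on toy tables. All identifiers are invented placeholders, and the use of PubChem CID and UniProt as the drug and target keys is an assumption made for the sketch.

```r
# Toy link tables (invented identifiers).
study_disease <- data.frame(nct_id = c("NCT_1", "NCT_1", "NCT_2"),
                            doid = c("DOID_1", "DOID_2", "DOID_1"),
                            stringsAsFactors = FALSE)
study_drug <- data.frame(nct_id = c("NCT_1", "NCT_1", "NCT_2"),
                         cid = c(101L, 102L, 101L),
                         stringsAsFactors = FALSE)
drug_target <- data.frame(cid = c(101L, 101L, 102L),
                          uniprot = c("P_A", "P_B", "P_A"),
                          stringsAsFactors = FALSE)
doid_terms <- data.frame(doid = c("DOID_1", "DOID_1", "DOID_1", "DOID_2"),
                         term = c("cancer", "Cancer", "cancer", "asthma"),
                         stringsAsFactors = FALSE)

# All disease x drug combinations within a study, then drug -> target.
links <- merge(study_disease, study_drug, by = "nct_id")
links <- merge(links, drug_target, by = "cid")
links <- unique(links[, c("nct_id", "doid", "cid", "uniprot")])

# Keep only the most common term for each DOID.
top_term <- tapply(doid_terms$term, doid_terms$doid,
                   function(x) names(which.max(table(x))))
links$disease_term <- as.character(top_term[links$doid])

# Distinct disease-target pairs per study, ignoring which drug links them.
disease_target <- unique(links[, c("nct_id", "doid", "uniprot")])
print(nrow(links))
print(nrow(disease_target))
```

The same disease-target pair can be reached through several drugs in one study, which is why the pair table is smaller than the full link table.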

9.3 PubChem molecules to ChEMBL targets

CID →(PubChem)→ INCHIKEY
INCHIKEY →(ChEMBL)→ MOLECULE_CHEMBL_ID
MOLECULE_CHEMBL_ID →(ChEMBL)→ ACTIVITY_ID

## [1] "CIDs: 3783 ; INCHIKEYs: 3781 ; pairs: 3783"
## [1] "INCHIKEYs: 3314 ; MOLECULE_CHEMBL_IDs: 3314 ; pairs: 3316"
## [1] "MOLECULE_CHEMBL_IDs: 2302 ; TARGET_CHEMBL_IDs: 3877 ; ACTIVITY_IDs: 127943 ; DOCUMENT_CHEMBL_IDs: 16959"
## [1] "CID2UNIPROT links: 27008 ; CIDs: 2112 ; UNIPROTs: 2521"

9.4 TDL counts

TDL counts
idgTDL N
Tchem 707
Tclin 324
Tbio 93
Tdark 2

9.5 PubMed references from AACT studies.

## [1] "Study references: 388031 ; PMIDs: 287758 ; studies: 61208"

9.6 PubMed references from ChEMBL activities.

ACTIVITY_ID →(ChEMBL)→ DOCUMENT_CHEMBL_ID
DOCUMENT_CHEMBL_ID →(ChEMBL)→ PUBMED_ID

## [1] "DOCUMENT_CHEMBL_IDs:: 16198 ; PMIDs: 15193"

10 Aggregating, scoring and ranking disease, target associations.

Evidence weighted by:

